Intrinsic curiosity method based on reward prediction error
Qing TAN, Hui LI, Haolin WU, Zhuang WANG, Shuchao DENG
Journal of Computer Applications, 2022, 42(6): 1822-1828. DOI: 10.11772/j.issn.1001-9081.2021040552
Abstract

Concerning the problem that, when the state prediction error is used directly as the intrinsic curiosity reward, a reinforcement learning agent cannot explore the environment effectively in tasks where state novelty is weakly correlated with reward, an Intrinsic Curiosity Module with Reward Prediction Error (RPE-ICM) was proposed. In RPE-ICM, a Reward Prediction Error network (RPE-Network) was used to learn and correct the state prediction error reward, and the output of the Reward Prediction Error (RPE) model was used as the intrinsic reward signal to balance over-exploration and under-exploration, so that the agent could explore the environment more effectively and exploit the reward to learn skills, achieving better learning performance. Comparative experiments were conducted on RPE-ICM, the Intrinsic Curiosity Module (ICM), Random Network Distillation (RND) and the traditional Deep Deterministic Policy Gradient (DDPG) algorithm in different MuJoCo (Multi-Joint dynamics with Contact) environments. The results show that, compared with traditional DDPG, ICM-DDPG and RND-DDPG, the DDPG algorithm based on RPE-ICM improves the average performance by 13.85%, 13.34% and 20.80% respectively in the Hopper environment.
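The abstract describes the mechanism only at a high level. The following is a minimal PyTorch sketch of one plausible reading of it: an ICM-style forward model produces a state prediction error, and a separate reward-prediction network's error is used to correct that bonus before it is added to the extrinsic reward for DDPG. The layer sizes, the tanh modulation, and the coefficient eta are illustrative assumptions, not details taken from the paper.

import torch
import torch.nn as nn
import torch.nn.functional as F

class RPEICMSketch(nn.Module):
    """Illustrative intrinsic-reward module: curiosity bonus corrected by reward prediction error."""
    def __init__(self, state_dim, action_dim, hidden=128, eta=0.01):
        super().__init__()
        self.eta = eta  # scale of the intrinsic reward (assumed value)
        # Forward model: predicts the next state from (state, action), as in ICM.
        self.forward_model = nn.Sequential(
            nn.Linear(state_dim + action_dim, hidden), nn.ReLU(),
            nn.Linear(hidden, state_dim),
        )
        # Reward-prediction network: predicts the extrinsic reward from (state, action).
        self.reward_model = nn.Sequential(
            nn.Linear(state_dim + action_dim, hidden), nn.ReLU(),
            nn.Linear(hidden, 1),
        )

    def intrinsic_reward(self, state, action, next_state, ext_reward):
        sa = torch.cat([state, action], dim=-1)
        # Classic curiosity signal: error of the forward model on the next state.
        state_err = F.mse_loss(self.forward_model(sa), next_state, reduction="none").mean(-1)
        # Reward prediction error, used here to damp the bonus where state novelty
        # says little about reward (an assumption about how the correction works).
        rew_err = (self.reward_model(sa).squeeze(-1) - ext_reward).abs()
        return self.eta * state_err * torch.tanh(rew_err)

    def loss(self, state, action, next_state, ext_reward):
        # Both auxiliary models are trained by regression on transitions from the replay buffer.
        sa = torch.cat([state, action], dim=-1)
        fwd_loss = F.mse_loss(self.forward_model(sa), next_state)
        rew_loss = F.mse_loss(self.reward_model(sa).squeeze(-1), ext_reward)
        return fwd_loss + rew_loss

In use, the total reward fed to the DDPG critic would be the extrinsic reward plus the output of intrinsic_reward, with the module's loss optimized alongside the actor and critic updates.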
